9 research outputs found

    Speeding-up reinforcement learning through abstraction and transfer learning

    We are interested in the following general question: is it possible to abstract knowledge that is generated while learning the solution of a problem, so that this abstraction can accelerate the learning process? Moreover, is it possible to transfer and reuse the acquired abstract knowledge to accelerate the learning process for future similar tasks? We propose a framework for conducting two levels of reinforcement learning simultaneously, where an abstract policy is learned while learning a concrete policy for the problem, such that both policies are refined through exploration and interaction of the agent with the environment. We explore abstraction both to accelerate the learning process for an optimal concrete policy for the current problem, and to allow the application of the generated abstract policy in learning solutions for new problems. We report experiments in a robot navigation environment that show our framework to be effective in speeding up policy construction for practical problems and in generating abstractions that can be used to accelerate learning in new similar problems. This research was partially supported by FAPESP (2011/19280-8, 2012/02190-9, 2012/19627-0) and CNPq (311058/2011-6, 305395/2010-6).
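    A minimal sketch of the two-level idea follows: every transition updates both a ground Q-table and an abstract Q-table, and the abstract values stand in for ground values that have not been learned yet. This is an illustration only, not the authors' actual construction; the grid world, the abstraction function phi, and all parameter values are assumptions made for exposition.

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
    ACTIONS = ["up", "down", "left", "right"]

    q_ground = defaultdict(float)    # (state, action) -> value
    q_abstract = defaultdict(float)  # (abstract state, action) -> value

    def phi(state):
        """Hypothetical abstraction: collapse 4x4 blocks of grid cells."""
        x, y = state
        return (x // 4, y // 4)

    def choose_action(state):
        """Epsilon-greedy on ground values; fall back on the abstract values
        while the ground estimates for this state are still untouched."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        if any(q_ground[(state, a)] for a in ACTIONS):
            return max(ACTIONS, key=lambda a: q_ground[(state, a)])
        return max(ACTIONS, key=lambda a: q_abstract[(phi(state), a)])

    def update(state, action, reward, next_state):
        """One Q-learning backup applied simultaneously at both levels."""
        for q, s, s2 in ((q_ground, state, next_state),
                         (q_abstract, phi(state), phi(next_state))):
            best_next = max(q[(s2, a)] for a in ACTIONS)
            q[(s, action)] += ALPHA * (reward + GAMMA * best_next - q[(s, action)])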

    COMPARISON AND EVALUATION OF ABSTRACT POLICIES FOR TRANSFER LEARNING IN ROBOT NAVIGATION TASKS

    This paper presents a new approach to the problem of solving a new task by reusing knowledge acquired while solving a similar task in the same domain, robot navigation. A new algorithm, Qab-Learning, is proposed to obtain the abstract policy that guides the agent in the task of reaching a goal location from any other location in the environment, and this policy is compared to the abstract policy derived from another algorithm from the literature, ND-TILDE. The policies are applied to a number of different tasks in two environments. The results show that the policies, even after the process of abstraction, have a positive impact on the performance of the agent.
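    The abstract does not specify how a transferred abstract policy is consulted during learning, so the following sketch assumes the simplest reuse scheme: with probability psi the agent follows the abstract policy learned on the source task, otherwise it explores. The names phi, abstract_policy, and psi are hypothetical.

    import random

    def transfer_action(state, phi, abstract_policy, actions, psi=0.8):
        """Act in the target task: with probability psi consult the abstract
        policy transferred from the source task, else pick a random action."""
        abstract_state = phi(state)
        if random.random() < psi and abstract_state in abstract_policy:
            return abstract_policy[abstract_state]
        return random.choice(actions)

    In such a scheme, psi would typically be decayed over time so that the agent relies increasingly on values learned in the target task itself.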

    Omecamtiv mecarbil in chronic heart failure with reduced ejection fraction, GALACTIC‐HF: baseline characteristics and comparison with contemporary clinical trials

    Aims: The safety and efficacy of the novel selective cardiac myosin activator, omecamtiv mecarbil, in patients with heart failure with reduced ejection fraction (HFrEF) is being tested in the Global Approach to Lowering Adverse Cardiac outcomes Through Improving Contractility in Heart Failure (GALACTIC‐HF) trial. Here we describe the baseline characteristics of participants in GALACTIC‐HF and how these compare with other contemporary trials. Methods and Results: Adults with established HFrEF, New York Heart Association (NYHA) functional class ≥ II, EF ≤ 35%, elevated natriuretic peptides, and either current hospitalization for HF or a history of hospitalization/emergency department visit for HF within a year were randomized to either placebo or omecamtiv mecarbil (pharmacokinetic‐guided dosing: 25, 37.5 or 50 mg bid). 8256 patients [male (79%), non‐white (22%), mean age 65 years] were enrolled with a mean EF of 27%, ischemic etiology in 54%, NYHA class II in 53% and III/IV in 47%, and median NT‐proBNP of 1971 pg/mL. HF therapies at baseline were among the most effectively employed in contemporary HF trials. GALACTIC‐HF randomized patients representative of recent HF registries and trials, with substantial numbers of patients also having characteristics understudied in previous trials, including more from North America (n = 1386), enrolled as inpatients (n = 2084), systolic blood pressure < 100 mmHg (n = 1127), estimated glomerular filtration rate < 30 mL/min/1.73 m² (n = 528), and treated with sacubitril‐valsartan at baseline (n = 1594). Conclusions: GALACTIC‐HF enrolled a well‐treated, high‐risk population from both inpatient and outpatient settings, which will provide a definitive evaluation of the efficacy and safety of this novel therapy, as well as informing its potential future implementation.

    Relational transfer between reinforcement learning tasks via abstract policies.

    No full text
    When designing intelligent agents that must solve sequential decision problems, we often lack the knowledge needed to build a complete model of the problem at hand. Reinforcement learning enables an agent to learn behavior by acquiring experience through trial-and-error interactions with the environment. However, knowledge is usually built from scratch, and learning the optimal policy may take a long time. In this work, we improve learning performance by exploring transfer learning; that is, the knowledge acquired in previous source tasks is used to accelerate learning in new target tasks. If the tasks present similarities, the transferred knowledge guides the agent towards faster learning. We explore the use of a relational representation that allows the description of relationships among objects. This representation simplifies the use of abstraction and the extraction of similarities among tasks, enabling the generalization of solutions that can be used across different, but related, tasks. This work presents two model-free algorithms for online learning of abstract policies: AbsSarsa(λ) and AbsProb-RL. The former builds a deterministic abstract policy from value functions, while the latter builds a stochastic abstract policy through direct search on the space of policies. We also propose the S2L-RL agent architecture, containing two levels of learning: an abstract level and a ground level. The agent simultaneously builds a ground policy and an abstract policy; not only can the abstract policy accelerate learning on the current task, but it can also guide the agent in a future task. Experiments in a robotic navigation environment show that these techniques are effective in improving the agent's learning performance, especially during the early stages of the learning process, when the agent is completely unaware of the new task.
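    AbsProb-RL is described as building a stochastic abstract policy by direct search in the space of policies. The sketch below shows one plausible shape for such a policy object, assuming a softmax over per-(abstract state, action) preference weights; this parameterization is an assumption for exposition, not the thesis' actual one.

    import math
    import random

    class StochasticAbstractPolicy:
        """Stochastic policy over abstract states; the weights theta are
        what a direct policy search would optimize."""
        def __init__(self, abstract_states, actions):
            self.actions = actions
            self.theta = {(s, a): 0.0 for s in abstract_states for a in actions}

        def probs(self, s):
            """Softmax action distribution for abstract state s."""
            z = [math.exp(self.theta[(s, a)]) for a in self.actions]
            total = sum(z)
            return [w / total for w in z]

        def sample(self, s):
            """Draw an action for abstract state s."""
            return random.choices(self.actions, weights=self.probs(s))[0]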

    RAFT polymerization to form stimuli-responsive polymers

    No full text

    Transfer Learning for Multiagent Reinforcement Learning Systems

    No full text

    Cardiac myosin activation with omecamtiv mecarbil in systolic heart failure

    No full text
    BACKGROUND The selective cardiac myosin activator omecamtiv mecarbil has been shown to improve cardiac function in patients with heart failure with a reduced ejection fraction. Its effect on cardiovascular outcomes is unknown. METHODS We randomly assigned 8256 patients (inpatients and outpatients) with symptomatic chronic heart failure and an ejection fraction of 35% or less to receive omecamtiv mecarbil (using pharmacokinetic-guided doses of 25 mg, 37.5 mg, or 50 mg twice daily) or placebo, in addition to standard heart-failure therapy. The primary outcome was a composite of a first heart-failure event (hospitalization or urgent visit for heart failure) or death from cardiovascular causes. RESULTS During a median of 21.8 months, a primary-outcome event occurred in 1523 of 4120 patients (37.0%) in the omecamtiv mecarbil group and in 1607 of 4112 patients (39.1%) in the placebo group (hazard ratio, 0.92; 95% confidence interval [CI], 0.86 to 0.99; P = 0.03). A total of 808 patients (19.6%) and 798 patients (19.4%), respectively, died from cardiovascular causes (hazard ratio, 1.01; 95% CI, 0.92 to 1.11). There was no significant difference between groups in the change from baseline on the Kansas City Cardiomyopathy Questionnaire total symptom score. At week 24, the change from baseline in the median N-terminal pro-B-type natriuretic peptide level was 10% lower in the omecamtiv mecarbil group than in the placebo group; the median cardiac troponin I level was 4 ng per liter higher. The frequency of cardiac ischemic and ventricular arrhythmia events was similar in the two groups. CONCLUSIONS Among patients with heart failure and a reduced ejection fraction, those who received omecamtiv mecarbil had a lower incidence of a composite of a heart-failure event or death from cardiovascular causes than those who received placebo. (Funded by Amgen and others; GALACTIC-HF ClinicalTrials.gov number, NCT02929329; EudraCT number, 2016-002299-28.)

    Rivaroxaban with or without aspirin in stable cardiovascular disease

    No full text
    BACKGROUND: We evaluated whether rivaroxaban alone or in combination with aspirin would be more effective than aspirin alone for secondary cardiovascular prevention. METHODS: In this double-blind trial, we randomly assigned 27,395 participants with stable atherosclerotic vascular disease to receive rivaroxaban (2.5 mg twice daily) plus aspirin (100 mg once daily), rivaroxaban (5 mg twice daily), or aspirin (100 mg once daily). The primary outcome was a composite of cardiovascular death, stroke, or myocardial infarction. The study was stopped for superiority of the rivaroxaban-plus-aspirin group after a mean follow-up of 23 months. RESULTS: The primary outcome occurred in fewer patients in the rivaroxaban-plus-aspirin group than in the aspirin-alone group (379 patients [4.1%] vs. 496 patients [5.4%]; hazard ratio, 0.76; 95% confidence interval [CI], 0.66 to 0.86; P<0.001; z = −4.126), but major bleeding events occurred in more patients in the rivaroxaban-plus-aspirin group (288 patients [3.1%] vs. 170 patients [1.9%]; hazard ratio, 1.70; 95% CI, 1.40 to 2.05; P<0.001). There was no significant difference in intracranial or fatal bleeding between these two groups. There were 313 deaths (3.4%) in the rivaroxaban-plus-aspirin group as compared with 378 (4.1%) in the aspirin-alone group (hazard ratio, 0.82; 95% CI, 0.71 to 0.96; P = 0.01; threshold P value for significance, 0.0025). The primary outcome did not occur in significantly fewer patients in the rivaroxaban-alone group than in the aspirin-alone group, but major bleeding events occurred in more patients in the rivaroxaban-alone group. CONCLUSIONS: Among patients with stable atherosclerotic vascular disease, those assigned to rivaroxaban (2.5 mg twice daily) plus aspirin had better cardiovascular outcomes but more major bleeding events than those assigned to aspirin alone. Rivaroxaban (5 mg twice daily) alone did not result in better cardiovascular outcomes than aspirin alone and resulted in more major bleeding events.